Preserve nanosecond resolution when encoding/decoding times #7827
@spencerkclark I'd appreciate it if you could have a look here. All but one test pass, but I can't immediately see what that test is doing. It looks like mismatched dtypes on the attributes. If you have any suggestions for improvement, please let me know. I haven't added tests here yet.
I've reset the order of coders to the initial behaviour. Instead, the times are special-cased in the CFMaskCoder. Locally it works, but I'll only trust the CI.
All tests have passed. Rebased now on latest main. The issue described in #7817 is resolved. Ready for first reviews.
Thanks @kmuehlbauer -- I just wanted to give you a heads up that I'm pretty busy this week. Hopefully I'll get a free moment to look at this more closely next week.
Thanks for the heads-up, @spencerkclark. No worries, I need to apply some changes anyway as it turns out.
Thanks @kmuehlbauer -- I'm catching up here a little bit. I think if we want to swap the order of decoding datetimes / timedeltas with decoding the mask information, we will also need to decode the _FillValue into something that has a datetime or timedelta type. We may not have great test coverage in this area.
As a concrete example I'm thinking about something like the following:
>>> import numpy as np; import xarray as xr
>>> times = [np.datetime64("2000-01-01"), np.datetime64("NaT")]
>>> da = xr.DataArray(times, dims=["time"], name="foo")
>>> da.encoding["dtype"] = np.float64
>>> da.encoding["_FillValue"] = 20.0
>>> da.to_dataset().to_netcdf("test-encode.nc")
On the current main branch we get the following behavior:
>>> xr.open_dataset("test-encode.nc").foo.values
array(['2000-01-01T00:00:00.000000000', 'NaT'],
dtype='datetime64[ns]')
>>> xr.open_dataset("test-encode.nc", decode_cf=False).foo.values
array([ 0., 20.])
With this branch we get:
>>> xr.open_dataset("test-encode.nc").foo.values
array(['2000-01-01T00:00:00.000000000', 'NaT'],
dtype='datetime64[ns]')
>>> xr.open_dataset("test-encode.nc", decode_cf=False).foo.values
array([ 0., nan])
Note the difference in the case where decode_cf=False. It seems like with this branch NaN is written out to disk when the fill value prescribes 20.0, which is incorrect. I think maybe the fact that NaN is being written out to disk is hiding the issue that the _FillValue needs to be decoded?
(There is probably a way to test this without doing I/O, but this was the quickest example that came to mind).
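The values expected on disk can also be reproduced without I/O. What follows is a minimal NumPy-only sketch, not xarray's implementation; the helper name `encode_days_with_fill` and its parameters are hypothetical. It encodes times as float days since a reference date and writes the prescribed fill value wherever the input is NaT.

```python
import numpy as np


def encode_days_with_fill(times, ref, fill_value):
    """Hypothetical helper: encode datetimes as float days since `ref`,
    writing `fill_value` wherever the input is NaT."""
    times = np.asarray(times, dtype="datetime64[ns]")
    mask = np.isnat(times)
    # Subtracting the reference date gives timedelta64[ns]; NaT propagates.
    delta = times - np.datetime64(ref, "ns")
    # Dividing by one day yields float64 days (NaT becomes nan).
    days = delta / np.timedelta64(1, "D")
    # Replace missing entries with the prescribed fill value, not NaN.
    return np.where(mask, fill_value, days)


times = [np.datetime64("2000-01-01"), np.datetime64("NaT")]
encoded = encode_days_with_fill(times, "2000-01-01", 20.0)
print(encoded)
```

Under these assumptions the missing entry is encoded as 20.0 rather than NaN, which is the behavior the `decode_cf=False` output on main shows.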
Thanks @spencerkclark for taking the time. NaN has been written to disk (as you assumed). Let's have another try next week.
@spencerkclark With current master I get the following:
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:618: RuntimeWarning: invalid value encountered in cast
int_num = np.asarray(num, dtype=np.int64)
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:254: RuntimeWarning: invalid value encountered in cast
flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:254: RuntimeWarning: invalid value encountered in cast
flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
The latter was discussed in #7098 (casting float64 to int64); the former was meant to be resolved by this PR. I'll try to create a test case using
The example below is based only on Variable and the CF encode/decode variable functions.
import xarray as xr
import numpy as np
# create DataArray
times = [np.datetime64("2000-01-01", "ns"), np.datetime64("NaT")]
da = xr.DataArray(times, dims=["time"], name="foo")
da.encoding["dtype"] = np.float64
da.encoding["_FillValue"] = 20.0
# extract Variable
source_var = da.variable
print("---------- source_var ------------------")
print(source_var)
print(source_var.encoding)
# encode Variable
encoded_var = xr.conventions.encode_cf_variable(source_var)
print("\n---------- encoded_var ------------------")
print(encoded_var)
# decode Variable
decoded_var = xr.conventions.decode_cf_variable("foo", encoded_var)
print("\n---------- decoded_var ------------------")
print(decoded_var.load())
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:618: RuntimeWarning: invalid value encountered in cast
int_num = np.asarray(num, dtype=np.int64)
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:254: RuntimeWarning: invalid value encountered in cast
flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
/home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/coding/times.py:254: RuntimeWarning: invalid value encountered in cast
flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
---------- source_var ------------------
<xarray.Variable (time: 2)>
array(['2000-01-01T00:00:00.000000000', 'NaT'],
dtype='datetime64[ns]')
{'dtype': <class 'numpy.float64'>, '_FillValue': 20.0}
dtype num float64
---------- encoded_var ------------------
<xarray.Variable (time: 2)>
array([ 0., 20.])
Attributes:
units: days since 2000-01-01 00:00:00
calendar: proleptic_gregorian
_FillValue: 20.0
---------- decoded_var ------------------
<xarray.Variable (time: 2)>
array(['2000-01-01T00:00:00.000000000', 'NaT'],
dtype='datetime64[ns]')
{'_FillValue': 20.0, 'units': 'days since 2000-01-01 00:00:00', 'calendar': 'proleptic_gregorian', 'dtype': dtype('float64')}
Great, yeah, that's a nice example without writing to disk. Indeed I saw those warnings too, but omitted them in my earlier message to focus on the encoding issue (sorry about that). I agree that these are something we should address.
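For context on the `invalid value encountered in cast` warnings: recent NumPy versions emit them when NaN is cast to an integer type. One way to avoid them, shown here as a hedged sketch and not necessarily what this PR implements, is to mask NaN before the cast and set the NaT sentinel afterwards. The helper name `float_to_int64_nan_safe` is hypothetical.

```python
import warnings

import numpy as np

# int64 value that NumPy uses to represent NaT in datetime64/timedelta64.
NAT_INT = np.iinfo(np.int64).min


def float_to_int64_nan_safe(values):
    """Hypothetical helper: cast float64 to int64 without ever casting NaN."""
    values = np.asarray(values, dtype=np.float64)
    mask = np.isnan(values)
    # Fill NaN slots with a harmless placeholder before the cast ...
    out = np.where(mask, 0.0, values).astype(np.int64)
    # ... then mark them with the NaT sentinel afterwards.
    out[mask] = NAT_INT
    return out


with warnings.catch_warnings():
    warnings.simplefilter("error")  # any RuntimeWarning would raise here
    ints = float_to_int64_nan_safe([0.0, np.nan])

# Reinterpreting the integers as datetime64[ns] turns the sentinel into NaT.
decoded = ints.view("datetime64[ns]")
print(np.isnat(decoded))
```

The design choice is to keep NaN out of the integer cast entirely, so the result does not depend on platform-specific cast behavior.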
One other tricky edge case that occurs to me is one where an extreme fill value (e.g. |
Beautiful @kmuehlbauer -- I appreciate those additional tests, the additional warning, and the further cleanup. One more suggestion for a minor tweak to a test, and I think this is ready for a what's new entry. Huge thanks for taking on this gnarly issue!
Thanks all for pushing this to a mergeable state. @spencerkclark 👍 💯
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
Amazing work, @kmuehlbauer and @spencerkclark. Thanks!
* preserve nanosecond resolution when encoding/decoding times.
* Apply suggestions from code review
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
* use emit_user_level_warning
* move time alignment for nc3 to encode_nc3_variable
* fix test for encode_cf_timedelta
* fix CFMaskCoder for time-like (also allow timedelta64), add first tests
* rename to _unpack_time_units_and_ref_date as suggested in review
* refactor delta -> time_units as suggested in review
* refactor out function _time_units_to_timedelta64, reorder flow and remove unneeded checks, apply filterwarnings, adapt tests
* import _is_time_like from coding.variables
* adapt tests, add _numpy_to_netcdf_timeunit-conversion function
* adapt tests, add _numpy_to_netcdf_timeunit-conversion function
* adapt test as per review, remove arm_xfail for backend test
* add whats-new.rst entry
* Update doc/whats-new.rst
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
* Update doc/whats-new.rst
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
* fix whats-new.rst
---------
Co-authored-by: Spencer Clark <spencerkclark@gmail.com>
Awesome contribution @kmuehlbauer :)
xarray.coding.times.cast_to_int_if_safe
#7942,whats-new.rst
api.rst
Closes #7098 as not needed, but worked around to preserve the fast int-based timedelta calculation.
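To illustrate why integer-based calculation matters for nanosecond resolution, here is a small NumPy sketch (an illustration only, not code from this PR). Nanoseconds since the epoch for present-day dates exceed 2**53, so a round trip through float64 cannot preserve every nanosecond.

```python
import numpy as np

# One nanosecond past midnight on 2000-01-01.
t = np.datetime64("2000-01-01T00:00:00.000000001", "ns")
ns = int(t.astype(np.int64))

# Round trip through float64, which has a 53-bit significand.
roundtrip = int(np.float64(ns))

# The trailing nanosecond is lost in the float64 representation.
print(ns, roundtrip, ns - roundtrip)
```

Keeping the values as int64 throughout avoids this rounding.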